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Summary 



We propose a novel model for generating graphs similar to a given example graph. 
Unlike standard approaches that compute features of graphs in Euclidean space, our 
approach obtains features on a surface of a hypersphere. We then utilize a von Mises- 
Fisher distribution, an exponential family distribution on the surface of a hypersphere, 
to define a model over possible feature values. While our approach bears similarity 
to a popular exponential random graph model (ERGM), unlike ERGMs, it does not 
suffer from degeneracy, a situation when a significant probability mass is placed on 
unrealistic graphs. We propose a parameter estimation approach for our model, and 
a procedure for drawing samples from the distribution. We evaluate the performance 
of our approach both on the small domain of all 8-node graphs as well as larger 
real-world social networks. 
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1 Introduction 

Increasingly, many domains produce data sets containing relationships that are conve- 
niently represented by networks, e.g., systems sciences (the Internet), bioinformatics 
(protein interactions), social domains (social networks). As researchers in these ar- 
eas are developing models and tools to analyze the properties of networks, they are 
hampered by few samples available to evaluate their approaches. This gives rise to a 
problem of generating more network samples that can be viewed as drawn from the 
same population as the given network. 

While there are a number of possible approaches to this problem, perhaps the 
most well-studied model is the exponential random graph models (ERGMs, or in 
social network literature, p*), an exponential family class of models, matching the 
statistics over the set of possible networks to the statistics of the network in question 
[e.g., 17]. This and similar approaches have a long history as they generalize the 
Pi [7] and Markov random graph [4] models first developed in the social network 
literature. While such approaches are intuitive and have nice properties, they also 
suffer from the issue of degeneracy [5, 16], which is manifested in the instability of 
parameter estimation, and in placing most probability mass of resulting distributions 
on unrealistic graphs (e.g., empty graph or complete graphs). As a result, these 
approaches are not suitable for the purpose of generating graphs similar to the given 
one. 

This paper contains two contributions. First, we zero-in on the issue of degener- 
acy and discover that its cause is related to the geometry of the set of feature vectors 
and the number of graphs mapped to each feature vector (thought of as feature vec- 
tor weights). By augmenting feature vectors with logarithms of their corresponding 
weights, we show that only graphs with such augmented feature vectors on the relative 
boundary of the resulting extended convex hull can become modes of any exponential 
random graph model, explaining why unrealistic graphs (which are on the relative 
boundary) often get large probability masses. Second, using the insight of the obser- 
vation above, we propose a novel random graph model which is based on embedding 
the features of graphs onto a surface of a hypersphere. Since a spherical surface is a 
relative boundary of the sphere's convex hull, all of the feature vectors would then 
belong to the relative boundary of the convex hull and could potentially serve as 
modes of the corresponding distributions. This in turn helps to avoid the degeneracy 
issues which plague ERGMs. 

Our proposed approach makes use of spherical features obtained by embedding 
possible graphs onto a surface of a sphere [18], and then approximating the distri- 
bution of the resulting spherical feature space with a von Mises-Fisher distribution. 
Since the space of all possible graphs is too large to consider for embedding, we con- 
sider determining the embedding function based only on the neighborhood around 
the given graph thus resulting in a locally spherical embedding of the set of graphs. 
The main benefit of our approach is that it fixes the issue of degeneracy, with the 
mode of the distribution over the spherical feature vector coinciding with the features 
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of the given graph. An additional advantage of our approach is that its parameter es- 
timation procedure does not require cumbersome maximum entropy approaches used 
with ERGMs. 

We start by revisiting the ERGM model and presenting insights on why this model 
often fails to generate realistic graphs (Section 2). We then propose our alternative 
approach, exponential locally spherical random graph model (ELSRGM, Section 3) and 
evaluate it on both synthetic and realistic graphs while comparing it to ERGMs 
(Section 4). We conclude the paper (Section 5) with ideas for future work. 

2 ERGMs 

Overall, we are interested in probabilistic approaches for generating graphs similar to 
the given one. We consider the case of simple (unweighted, no self-loops) undirected 
graphs G = (V, E) G Gn- 1 Where n — \V\ is the number of vertices; E G {0, i} nxn i s 
a symmetric binary adjacency matrix with zeros on its diagonal, with = 1 iff there 
is an edge from Vi to Vj, and = otherwise, where i>j G V and G E. There are 

\Q n \ = 2(2) possible labeled graphs with n vertices, a finite but often prohibitively 
large number even for fairly small n. 

We first consider the well-studied exponential random graph model (ERGM, also 
known as p*) as a starting point for our approach. 

2.1 ERGM Definition 

In the area of social network analysis, scientists are often interested in specific features 
of networks, and some of the state-of-the-art models explicitly use them to define 
functions of network sub-structures. We will denote the vector of these functions by 
f '■ Gn ^- d - Among the examples of such features used by social scientists, we have 
the number of edges, the number of triangles, the number of fc-stars, etc.: 

fedge{G) = ^ ^ e { j\ 
l<<i<j<n 

Ia(G) = ^^^eijCikejk; 

l<i<j<k<n 

n k 

a*(g) = E £-£ IK a) 

1=1 1 < n < ■ ■ ■ < ife j=l 
»1 yt i, . . . ,i k ^ i 

The subgraph patterns corresponding to features in equation (1) are shown in 
Figure 1. 

A commonly used probabilistic model over Q n is an exponential family model that 
uses the expectation of / as a vector of sufficient statistics. The distribution over Q n 

1 Our approach extends to directed graphs as well. 
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Figure 1: Simple typical subgraph configurations for undirected graphs. From left to 
right: edge, triangle, two-star and three-star configurations. 

can be parameterized in the form 

P(C|^ aP 'y ,G £S „ (2) 

where G M, d is the vector of natural model parameters, / (G) is a vector of features 
of G, and 

Z{6)= £exp<0,/(G)> 

Geg n 

is the partition function. 

Let V = {f (G) : G G Q n } denote the set of possible feature- vector values. Under 
ERGMs, the space Q n of possible graphs is coarsened into a much smaller set V of 
possible features. The distribution in (2) can also be viewed as a distribution over V 
with all of the graphs mapped to the same value of / assigned the same probability 
mass. Let w (as) = \{G G Q n : f (G) — x}\ be the number of graphs corresponding to 
a feature value x G V. (We will refer to w (cc)'s as weights.) The distribution P (G\0) 
can also be extended to the set V, 

J? ] (3) 
Z(0) = 22w{x)exp(0,x) , 

xev 

also an exponential family distribution. We let C C ]R d denote a convex hull of P. 
Since ^ n is finite, so would be V, and C would be a polytope in M. d . Using a well- 
known result from the theory of exponential families [1] , if x* G lR d is a vector of the 
sufficient statistics (e.g., for a single example graph G*, x* = f (G*)), a maximum 
likelihood estimate (MLE) (x*) satisfies 

E p(G\6( x *))f (°) = E p( x \e{x*)) x = x *> ( 4 ) 

and it exists, and the corresponding distribution P ^-\0 (x*)j is unique as long as 

x* G rint(C), the relative interior of the convex hull of the set of possible feature 
vectors V. In essence, ERGMs are designed to preserve mean features of the observed 
graph, a very intuitive and often desirable property. However, it is also well-known 
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that the estimation of 8 (x*) is an extremely cumbersome task, complicated by the 
fact that the exact calculation of Z (0) is intractable, and approximate approaches 
(e.g., pseudo-likelihood, MCMC) are employed instead [9]. 

2.2 Degeneracy in ERGMs 

ERGMs often suffer from the problem of degeneracy [5]. There are two types of 
degeneracy usually considered. The first type occurs when the MLE estimate (x*) 
either does not exist, or the MLE estimation procedure does not converge due to 
numerical instabilities [5, 16]. 

The second type of degeneracy happens when 6 (x*) can be reliably estimated, but 
the resulting probability distribution places significant probability mass (or virtually 
all of probability mass) on unrealistic graphs, e.g., empty or complete graphs. This 
type of degeneracy can be considered from another viewpoint; that is the mode of 
ERGM corresponding to the 6 (x*) may be placed on x G V very different from 
x*. This is an undesirable property as there is little justification for placing large 
probability mass over a region away from observed feature vectors while placing little 
mass over the observed example. 

For an illustration, consider the set Q% (the set of all possible simple undirected 
graphs with 8 nodes) with feature vectors / (G) = (fedge (G) , /a (G)). In this case, 
Figure 2 displays (left plot) the support space V consists of 228-(edge,triangle) pair- 
statistics for all possible 8-node undirected graphs. The right plot of Figure 2, displays 
a diameter distribution highlighting that graphs with different topologies map to 
same edge-triangular feature pairs. The diameter was computed by observing that 
each feature-pair could be viewed as a cell of graphs with same feature count. This 
can be accomplished by first computing a perturbation graph whose nodes are all 
non-isomorphic graphs of 8-nodes. A graph edit distance is applied on the set Q$ 
containing 12346-non isomorphic graphs (computed using nauty [13]). To compute 
the diameter for each cell given the perturbation graph, one only has to identify the 
graphs mapping to that cell and extract the maximum number of edges for any pair of 
graphs with their feature counts mapping to that cell. Table 1 shows the complexity 
of small sized graph spaces. 

Table 1: Complexity of graph spaces for fixed number of nodes. 



n number of edges all graphs are 2^ 2 ' non-isomorphic 



7 


21 


2,097,152 


1,044 


8 


28 


268,435,456 


12,346 


9 


36 


68,719,476,736 


274,668 


10 


45 


35,184,372,088,832 


12,005,168 


11 


55 


3,602,879,701,896,397 


1,018,997,864 


12 


66 


7,378,697,629,483,821,000 


165,091,172,592 
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Figure 2: left: Distribution of 8-node graphs, right: Distribution of cell diameter; 
where we consider each feature-pair as a cell and compute the graph edit distance 
between each pair of graphs with feature counts mapping to that cell (feature-pair). 

Figure 4 shows the support V for the distribution over feature- vectors (all circles), 
and its convex hull C (boundary in green). Only a small subset of the example feature 
pairs x* = f (G*) result in distributions over V with modes coinciding the example 
feature vector x* (red discs with black borders). ERGMs estimated from other feature 
vectors will thus generate graphs with different features from the example graph as 
shown in Figure 3. 

It is this behavior that we are most concerned with and are trying to address in 
this paper, and from now on when we mention degeneracy, unless otherwise noted, 
we mean the second type of degeneracy. 

2.2.1 Approaches to Fixing Degeneracy in ERGMs 

Recent approaches in literature have focused on a more flexible specification of the 
model in equation (3). This has been achieved by a mixed set of feature statistics 
which includes the node attributes of a given graph [6]. Instead of using only the 
structural properties (edges, triangles etc.) of the graph, other attributes (e.g., gen- 
der, race, age, etc) are included as covariates for the model. This approach, however, 
does not address degeneracy in general as one has to know what set of features and 
attributes to choose in attempting to minimize degeneracy [6]. Such a task demands 
domain knowledge to accurately specify a reasonable model. Another approach makes 
use of the geometrically weighted edgewise shared partner, the geometrically weighted 
dyadic shared partner, and the geometrically weighted degree network statistics as 
new statistics for the model in (2). These specifications when parameterized have 
been shown to lead to curved exponential families [9]. However, the difficulty with 
this approach is not only in the parameter estimation of the resulting curved exponen- 
tial model, but also on having to avoid other graph features that may be dependent 
to these specifications as that will result in degeneracy [8]. 
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Figure 3: A degenerate ERGM specified by edge-triangle pair for an 8-node graph(£op 
figure). Colorcoded is the pmf over the edge-triangle space for an estimated MLE 
Qmle = (—0.992,0.617). "+" indicates the observed feature and its mean, while the 
"o" shape indicates the ERGM mode. 



2.3 Why Are Unrealistic Graphs Likely under ERGMs? 

It may not be entirely surprising that the probability mass over support V does 
not concentrate around the expected value of P as the mode and the mean of the 
exponential family distributions do not always match. This does not however imply 
that the mode is placed on features corresponding to unrealistic graphs. To investigate 
the degeneracy further, we first introduce the following result from convex analysis. 

Theorem 1. Suppose C C M. d is a full- dimensional bounded convex polytope with a 
finite set of vertices V = {x 1 , . . . , x L ). Then for any e R d \ {0} , 

Me = argmax (8, x) C rbd(C) , 
xec 

where by rbd(C) we denote the relative boundary of the convex hull C. 
Proof. Since C is a convex hull of V, for all x G C 

L L 

x = \ix\ for some Ai, . . . , \l > 0, A/ = 1. (5) 
i=i i=i 
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Figure 4: Solid points show mode placement on the edge-triangle feature pairs that 
form the vertices of the extended hull as a result of Theorem 1. Overlayed on solid 
points are modes obtained from using the estimated 9 m i e of equation (3). 



For any G M. d \ {0}, let Ve = argmax^gy (0, x), and let cie = max^y (0, x). Then 

V x e C, \/x e e V e 

I L \ L L 

(0, x ) = l0,j2 v ) = J2 Xi < > xl ) ^ E Xi ^ x °) 

\ i=i I i=i i=i 

= (0,x ) = a . 

Then {x G M. d : (0, x) < a^} is a supporting hyperspace, and % (a#) = {a; G M. d : (0, x) 
is a supporting d— 1 dimensional hyperplane for C. Since C is a full-dimensional convex 
polytope, C (]L % (ae), and therefore, H (ag) is a proper supporting hyperplane. Thus 
rint(C) nH(a ) = (e.g., [2], Thm. 4.1). Observing that M e C % (a ) completes 
the proof. □ 

We now consider the formulation in (3), from which after taking the log-likelihood 
we observe that 

argmaxlnP {x\0) = argmax {(0, x) + lnu> {x)} . (6) 

x&v xev 

We extend V to include the weight w. Let 

V = {(x, In w (x)) : x G V} be the extended set of features, with C being the resulting 
extended convex hull for V and V the set of vertices for C. Then 

(0,x) + lnw(x) = ((0,1) ,(x,lnw(x))) . 
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Figure 5: View of mode placement on facets of the extended 3D convex hull. Each 
mode forms a vertex of a given triangle (denoted by black lines) for a facet of the 
convex hull. 



By Theorem 1, only the points on the boundary rbd (C) can maximize (6). Thus to 
find the set of possible modes of (3), we can restrict our attention only to 
M = {x : (x,\nw(x)) G V}. 

Consider an illustration in Figure 5 for the case of all 8-node graphs. Only points 
on the boundary of the polytope C can potentially be modes of any ERGM specified 
on the feature set in Figure 4 (points on the boundary are denoted by solid black 
discs and solid red discs with black borders on both plots). Figure 4 further reveals 
that only a small number of points on the relative boundary of C correspond to actual 
observed modes for equation 3 (ERGMs with parameters 6 (a;*) from (4)). 2 There are 
two possible reasons for this occurrence. One, some of the points in V correspond to 
= (0,t) with t 7^ 1 (where t G M, is a scalar augmenting the parameter vector), so 
not all of M. may be the modes of (3). Two, the MLE solution 6 (x*) may lie outside 
of the cone [16] 

\ G R d : x* G argmax (0, x) \ 

I x£C ) 

of parameters for which x* is the mode. 3 

The above observations also explain why considering a curved exponential model 

The solid red discs in Figure 4 handles the case where modes of the maximum likelihood distribution are not 
unique. 

Often, feature vectors corresponding to the empty and to the complete graphs have particularly large cones, 
with many 6 (x*) values falling within these cones, where x* is the feature maximizing (6). This explains why many 
ERGMs place large probability masses on these degenerate graphs. 
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[8] 

P (x\0) = 0*0 exp (g(e),x), xeV 

does not rectify the issue of degeneracy as curving the space of parameters does not 
change the geometry of the feature space. 4 

Finally, we note that this type of degeneracy is not restricted to distributions over 
statistics of finite graphs. Similar issues can in general arise with exponential family 
distributions on a bounded support owing to the fact that the exponential family 
distributions are designed to match the mean statistics and not to concentrate the 
probability mass around the mode. Consider an illustration in Figure 6 for a 20 x 20 
grid of uniformly spaced points; fitted is the exponential family model p(x\6) oc 
exp(0,£c). As evident from the plots, the estimated model exhibit the degeneracy 
issue described above. 
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Figure 6: An illustration of degeneracy of exponential models on non-graph data. 
Colorcoded is the pmf over 2D grid spaces for estimated MLEs (left plot) 6 MLE = 
(-0.086,0.086) & (right plot) MLE = (0.120,-0.160). "+" sign indicates the ob- 
served feature and its mean, while the "o" shape indicates ERGM mode. 



We observed the same ERGM behavior in separate experiments on other types of features spaces e.g. Edge-vs- 
2Star, Edge-vs-GWED, Triangle-vs-GWESP (GWED: geometrically weighted degree distribution, GWESP: geomet- 
rically weighted edgewise shared partners [9].) 
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Algorithm 1 Outline of ELSRGM Procedure 

Require: Given an example graph G* G Q n and a feature vector function / : Q n — >■ 

R d : 

1: Generate a neighborhood B C Q n by performing a random walk in £ n starting at 
G*. 

2: Compute the set of statistics V {&). 

3: Compute a mapping e : R d — > § p_1 to embed V (B) onto S^ -1 , a surface of a 
sphere in IR P . 

4: Estimate the parameters for the von Mises-Fisher density / over the space of 
hyper-spherical features EP -1 . 

5: Approximate / with a density /, a mixture of kernels centered around the spher- 
ical features corresponding to graphs in B and recompute B when new sample 
graphs are discovered. 



3 Exponential Locally Spherical Random Graph 
Model 

The main result of section 2 provides us with a very important insight, that is; type II 
degeneracy in ERGMs is due to the bounded nature of discrete exponential models. 
The formulation of such models is sensitive to the geometry of the support space V . 
The geometry of the convex hull on V defines which points are likely to have most or 
all of the probability mass placed on them. If x £ rbd(C) then a model computed 
for x will place very little probability mass on x, while most of the mass is placed 
on some point (i.e. mode) x* G rbd(C). This result suggests that mapping all points 
x G V onto a surface that belongs to its relative boundary and defining a model in 
the new space would solve the degeneracy issue. In the following sections, we propose 
mapping all observed features onto a spherical surface §> p_1 since every point on it 
belongs to rbd(C(S p_1 )). We then define a distribution and sampling techniques for 
graphs over the resulting feature space. 

3.1 Algorithm 

In this section we describe the new model for graph sampling, one that can be 
used to generate non-degenerate graphs similar to the given one. Since the approach 
we are proposing is based on an exponential family model over the locally spherical 
embeddings of graph features, we will call it an Exponential Locally Spherical Random 
Graph Model, or ELSRGM for short. Our approach is executed in several steps as 
outlined in Algorithm 1. 

First, we sample the neighborhood B C Q n around the given example graph G*, 
and then compute the set V (B) — {f (G) : G G £>} of feature vectors corresponding 
to the graphs in the neighborhood. If the space of graphs (Q n or its subset) is small, 
then it may be possible to consider all graphs in the set. Otherwise, the neighborhood 
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Algorithm 2 Outline of Spherical Embedding 

Require: Given dissimilarity matrix D nxn , with n number of graphs. 
1: If the spherical point positions are given by X iy i = 1, • • • , n, then (X^Xj) = 

r 2 cos Pij, with Pi, = d -f. 
2: If X in unknown, compute for X such that XX T = Z, where Zj 3 - = r 2 cos (3ij and 

dij G D. Find the radius of sphere as r* = argmin r Ai{Z(r)}. Where Ai is the 

smallest eigenvalue of Z. 
3: Set Z = £ and X* = arg min^T^i \\XX T - Z\\ 

4: Decompose Z, Z = UAU T . Set the embedding positional matrix to be X* = 

1 /2 

U nX kA k ' xk , where is chosen such that the elements of U nx t corresponds to the 
largest k eigenvalues of Akxk- 



of G* is sampled by a random walk which at each step considers graph one edge 
deletion/insertion away from the current graph. 

The resulting feature set is then embedded in ap- 1 dimensional unit hypersphere 
§ p_1 C W by a linear mapping e : R d — > § p_1 C R p . We are using the spherical em- 
bedding approach of [18], Algorithm 2), in which the mapping is chosen to minimize 
the Frobenius matrix distance between the normalized dissimilarity matrix (in our 
case, matrix of Euclidean distances between feature vectors in V (B)) and the matrix 
of the Euclidean outer product (in W) for the vectors in {e (cc) : x e "P (£>)}. We 
denote the set of resulting spherical features by S (B) = {e (f (G)) : G G B}. 5 One of 
the beneficial properties of such embedding is that it preserves neighborhood proper- 
ties, i.e., transformed feature vectors close to each other are mapped to vectors which 
are also close to each other. For our case, the embedding is locally spherical as we 
determine the mapping based only on a small subset of possible observed graphs, and 
determining the spherical coordinates for the rest by recomputing the embedding iter- 
atively based on the uncovering of new candidate graphs for the neighborhood B. As 
a result, the distance preserving property may hold only for the graph neighborhood 
on which the transformation was estimated. 

The next step is to estimate a distribution over all possible values in § p_1 of 
the spherical features. We propose to use von Mises-Fisher directional distribution 
(denoted in the rest of the paper as VMFD) with a pdf 

2-1 

fvmfd(y\fJ>, k) = —— g- — exp (nn T y) , y G S p_1 (7) 

(27r)3Jp_i(K) 

where fj, is the location parameter, k > is the concentration parameter, and Ik 
denotes the modified Bessel function of the first kind and order k. VMFD is a 
member of the exponential family; unlike a general exponential family distribution, it 
is symmetric with y = fi serving as both its mean and mode. Parameter k determines 
how concentrated the density is around the mode. When k — 0, VMFD corresponds 



We refer to the spherically embedded graph feature vectors as coordinates of the graph. 
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to the uniform distribution over the hyper-sphere § p_1 , while as k — > oo, VMFD is 
concentrated at the point fi. See [12] for more details on VMFD. As we wish the 
example graph G* to be the mode for the distribution over possible graphs, we set the 
mode of VMFD to ft — e (f (G*)). Using MLE, one can compute for k from S (B) 

with already set location parameter fii.e. k m i e = 2(i-p, T J2 P ~e(/(G))/|5(g)|) • Given 

only a single example graph, the concentration parameter is undefined and also, k m \ e 
can be observed to be a function of the random-walk-based sampling procedure, which 
is arbitrary. An alternative is to treat k as a user-defined parameter controlling how 
concentrated the region around the example graph is. As a result, one obtains a 
distribution with density 

/(y) = /vm/d(y|£»«) (8) 

over possible spherical feature vectors y G S p_1 . 

We note that one can employ a different family of distributions other than VMDF 
for the task of modeling spherical data. Other candidates provide more degrees of 
freedom, but also more difficult parameter estimation methods [12]. We leave the 
investigation of this aspect of our algorithm for future work. 

3.2 Sampling under ELSRGM 

Using the ideas and steps from the previous subsection, we estimate a probability 
density function over a unit sphere on the domain of possible spherical feature values 
for graphs. Before proceeding further, however, we need to resolve several issues. 
First, we are interested in a discrete probability distribution over a very large but 
finite set of possible features S (B). How would one obtain such a distribution P 
from the VMFD density? Second, what portion of the mass of / is to be associated 
with each of the coordinates for possible graphs G G Q n (or y G S (B))l Third, it is 
infeasible to enumerate all of the possible graphs in Q n , and it may be infeasible to 
consider all possible spherical feature vectors. How can one perform sampling in Q n 
so that the resulting spherical features are distributed according to P? The following 
subsections will outline our approach to resolving these issues. 

3.2.1 Density as an Approximation to the Smoothed Distribution 

As a first step, consider a setting where all possible spherical feature vectors S (B) 
can be enumerated for a fixed number of nodes n. We propose to approximate VMFD 
using a mixture of kernels with one mixture component for each graph's spherical co- 
ordinates. (Alternatively, one can consider one mixture component for each possible 
spherical feature vector in S (B).) Intuitively, we consider the density / to be a base- 
line approximation to the distribution obtained by smoothing a discrete distribution 
over spherical feature values S (B). More formally, for a set of graphs Q, we will index 
its elements G±, . . . ,Gk with K = \Q\, and denote the spherical feature vector for 
graph G k byy k = e (f(G k )) G S (B) . 
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Assume / (y) is the estimated density in Equation (8). Let 



K 



(9) 



k=i 



with hk (y) = fvmfd {y\yk, K h k ), where y& is the location for kernel hk, and all of the 
kernels share a user defined concentration (bandwidth) Kh k chosen to assign more 
weight to points closer to y k - The parameters tt = {ir k : k — 1, . . . , K}, n k > 0, 
XlfcLi — 1 are probabilities associated with each y*.. 

The estimation of the 7r's is carried out by optimizing with respect to the Kullback- 
Leibler (KL) information: 



where A is the Lagrange multiplier. It can be shown that the objective function 
J (ir) in (10) is convex, and it can be minimized efficiently using standard convex 
optimization techniques. 

3.2.2 Metropolis-Hastings Algorithm 

The goal is to draw samples from Q n according to the probability mass {tti, . . . , 71r}, 
where K = \Q n \. If the number of nodes n is small (< 12), random graphs can be 
enumerated using nauty ([13]) and it is possible to identify all possible feature values 
to compute probability masses 7Ti, .. . , ttk associated with each graph. Graphs can 
then be sampled directly according to the resulting multinomial distribution. When 
all possible graphs of n-nodes (for n < 10) are observed, an alternative is to make use 
of the Metropolis-Hastings approach as outlined in Algorithm 3. However, in practice, 
the number of nodes is usually too large to explicitly consider all possible graphs, and 
the initial neighborhood B would include only a small portion of all graphs. We 
propose an approach that will allow us to draw samples from the distribution over 
graphs including the graphs in Q n \ B. 

For better understanding, we first propose a graph generation approach assuming 
n is small, and all graphs G\, . . . ,Gk can be enumerated. We will draw a point 
in the embedding space, and then employ Markov Chain Monte Carlo (MCMC) 
approach to draw a graph "corresponding" to this point by constructing a Markov 
chain in the space Q n of graphs. The pseudocode is presented in Algorithm 3. If we 
could draw samples directly from /, this procedure would be equivalent to drawing 
a vector y* ~ /, and then trying to identify out which of the K components was 
used to generate y* by using MCMC in the posterior over k = {1, . . . , K}. The 
only approximation employed in this case is that instead of drawing y* ~ /, we are 
drawing y* ~ /, but according to the KL-divergence measure (Figure 8), / and / are 




(10) 



very close. 
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Algorithm 3 MCMC Algorithm for Sampling Graphs 



Require: / is given. 
1: Draw a sample y* ~ /. Set t — 0, and perturb the example graph G G Q n to 

generate a random graph G° G Q n to initialize the chain. 
2: repeat 
3: t<-t + l. 

4: Sample a proposal graph G' from the proposal distribution at time t, 
QiG'lG 1 - 1 ). 

5: Compute ratio, r = p^r^g^) 



where P(G = G k \y) oc 7r fe /i fc (y) . 

G' with prob min(r, 1) 
G 1 - 1 o.w. 
7: until convergence of the chain 



6: G t 



To sample from the von Mises-Fisher distribution / we follow the approach out- 
lined in [19]. For proposal distribution Q we consider a uniform distribution 
over graphs one edge insertion/deletion away from In our experiments on small- 
and large-sized graphs, with the number of iterations set to T = 1000, the algorithm 
is observed to converge and does produce graphs with topologies that resemble the 
observed graph. 

For larger graphs, however, we cannot compute explicitly ir k for all K graphs. 
This situation can be thought of as similar to the case of countably infinite number 
of objects in which a Dirichlet process can be employed to assign some weight to 
graphs that are not yet observed. It is as if, / is approximated by a countably infinite 
mixture / (y) = YlkLi n kh k (y), he., assuming that the number of graphs is countably 
infinite instead of just very large. In this case, we are employing Dirichlet process 
mixture model DP (a, Go), where a is a concentration parameter and Go is a uniform 
distribution u p (fi) oc 1, n G S p_1 for the kernel location since assume that each yet 
unobserved set of features is equally likely. 

Assuming that K different graphs have been observed, equation 9 then becomes 

K 

Ea 
K k h k (y) + -rr- — u p (y) 
K + a 

k=l 

Algorithm 4 details the pseudocode for sampling of small to large graphs on a 
spherical space. For basic information on Dirichlet process mixtures and their infer- 
ence see [14]. 

4 Experimental Evaluation 

We adopt the common goodness of fit measures in network generation studies to in- 
vestigate how well our model fits the observed data (i.e. by comparing the observed 
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Algorithm 4 MCMC-DP Algorithm for Sampling Graphs 

Require: / is given. Graphs G 1: . . . , G L initially observed. Set K = L. Estimate ir 
by minimizing (10). 

1: Draw a sample y* ~ /. Set t = 0, and perturb 6 the example graph G G Q n to 

generate a random graph G° G Q n to initialize the chain. 
2: repeat 
3: t<-t + l. 

4: Sample a proposal graph G' from the proposal distribution at time t, 
QiG'lG*- 1 ). 

tr n + +• — P(G'\y*)Q(G t - 1 \G') 

5: compute ratio, r — p(G*- 1 |y*)Q(G'|G*- 1 ) 
6: where 

ir k h k {y) k — l,...,K, 

■j^u p (y) G = new. 



P(G = G fc |y)oc 



c + r»t _ / ^" wit/i pro6 min(r, 1) 
7; oet G — s 

1^ G 1 1 o.w. 

8: If G* is new, then set K = K + 1, G K = G*. 

• Compute /(G x ), set B = B U /((?*-). 

• Re-compute 5 (B) (to generalize to new samples) and re-estimate 
7Ti, . . . , 7T^. 

9: until convergence of the chain 



statistics with a range of the same statistics obtained from simulating many networks 
using the fitted model) [9]. If the observed network is not typical of the simulated 
networks for a particular measure then the model is either degenerate or simply a 
misfit. We first consider the degree distribution, which is defined as the statistics: 
Do, Di, ■ ■ ■ , -D n -i, with each Di representing the number of nodes with i edges con- 
nected to them, divided by n. Secondly, we compute the edgewise shared partner 
distribution, which is defined as the statistics: EP , EPi, ■ ■ ■ , EP n _ 2 , with each EP { 
representing the number of edges in the graph between two nodes that share exactly 
i neighbors in common, divided by the total number of edges. Thirdly, we compute 
the triad census distribution, which is the proportion of 3 — node sets having 0, 1, 2, 
or 3 edges among them. Lastly, we compute the minimum geodesic distance distribu- 
tion; which is the proportion of pairs of nodes whose shortest connecting path is of 
length k, for k — 1, 2, • • • . Also, pairs of nodes that are not connected are classified 
as k = oo. 

In all our experiments, we define each feature vector to be, 

/ (C) — {fedge, /b*, " " " , /(ll)*, /a) 

These features have been observed to be among the set of subgraph patterns that 
capture social interactions and network formation in real processes [4]. 
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4.1 Small graphs 

We first consider the experiments for Q n with n = 8 nodes. Figure 7 (left) shows a two 
component 8-node synthetic graph, Gt es t\ , while Figure 8 displays the i^L-divergence 
KL (j || f^j computed by first observing 300 8-node graphs (graph-edit distance < 3 

of neighborhood around G testl ), and allowing the MCMC-DP (Algorithm 4) to discover 
and fill-in the neighborhood set £>, with the baseline VMFD centered at the coor- 
dinates of Gtesh- Gtesti was chosen to test our hypothesis for the extended feature 
space, that is if x* G rint(C') e.g. x* — f (Gtesti) G rint (C) , then the corresponding 
ERGM will be degenerate in relation to the analysis of Theorem 1. G test2 is used 
to test our second hypothesis, i.e. if x* G rbd(C') then the ERGM specified by x* 
is most likely to generate realistic graphs. In Figure 7 (right) we show an 8-node 
synthetic graph G t est 2 whose extended feature vector (x* = f (G test2 ) G rbd (£)). 

For the experimental set-up, the bandwidth for each kernel hk is treated as a user 
defined parameter and is set to Kh k = 400, while the DP prior parameter is set to 
a = 0.5. Parameters for the VMFD are set as discussed in section 3, with k fixed 
at 140 to control the concentration of fvM f d ■ We compute the parameters for the 
ERGM model P and simulations using the statnet package [6]. 

Figure 9, depicts 100 simulated summary results for f{G te st 1 ) G rint (C). The 
result obtained from our ELSRGM (in unshaded blue box plots), displays no sign of 
degeneracy while the ERGM result shows summary statistics that appear to relate 
to those of empty and complete graphs- a sign of degeneracy. Figure 10, shows 
summary statistics of simulated networks obtained for the ERGM and the ELSRGM 
models when f(Gt es t 2 ) G r bd (C). The results show no sign of degeneracy from both 
the specified ERGM and the proposed ELSRGM . This confirms our second hypothesis, 
that is having extended feature vectors lie on the relative boundary of the convex hull 
enables the generation of realistic graphs from exponential family models. 

4.2 Larger Graphs 

For larger data sets, we first consider a undirected Dolphin social network with 62 
nodes [3, 11]. We apply the MCMCDP approach as outlined in Algorithm 4, by sampling 
graphs via a random walk for 1000 steps starting from a random network G° G B. 
Figure 11 summarizes the results of 100 simulations for the Dolphin network from 
the two approaches. The proposed ELSRGM shows a relatively better performance of 
capturing the distribution of statistics of the Dolphin network. It is appears that 
ERGM model specified by simple statistics is incapable of generating the distribution 
of statistics that resembles those of the observed network. The lack of fit by an ERGM 
model in the degree distribution, geodesic distance, and shared partners distribution 
indicates the presence of degeneracy. 

We next evaluate ELSRGM on 205-node Faux-Mesa-High social network [6, 15]. 
We again apply the MCMC-DP algorithm by sampling graphs via a random walk for 
1000 steps starting from a random network G° G B. Figure 12 depicts the results of 
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Figure 7: Two 8-node synthetic test graphs. Left: 2-component graph G tes ti falling 
inside the relative interior of the convex hull C of extended features. Right: G test 2 
falling on the relative boundary of the convex hull C for the extended feature space. 
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nGraphs 



Figure 8: Sample i^L-divergence KL \\ f j for ELSRGM as more graphs get uncov- 
ered. The initial neighborhood B <Z Q n (n = 8) is built around G tes ti- 



100 simulated networks from the specified ELSRGM and ERGM models. Again, we 
observe that ELSRGM generates distributions of statistics that resemble those of the 
observed network while ERGMs shows signs of misspecification. 

Finally, we assess ELSRGM's performance on a benchmark data set from social 
network analysis, which is used in testing whether a model can overcome the degen- 
eracy phenomenon. We consider the first matrix (39 nodes) of the Kapferer's tailor 
shop data [10]. The result of 100 simulated network summaries shown in Figure 13 
suggests that the proposed ELSRGM does not suffer from degeneracy. ERGM results 
were not included for this data set due to severe unstable estimation of MLE using 
simple specifications considered in our analysis (corresponding to the first type of 
degeneracy) . 

5 Conclusion and Future Work 

In this paper, we investigated the cause of degeneracy in ERGMs, and explained the 
degeneracy as related to the feature vectors belonging to the relative interior of the 
convex polytope of feature values. To correct this issue, we proposed an algorithm that 
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Figure 9: (a) Synthetic graph G te sti] (b) Corresponding feature-pair: f(G testl ) G 
rint (C) ; (c) Simulations from models given G tes tx • The observed statistics are in- 
dicated by the solid lines; the box plots include the median and interquartile range 
of simulated networks. ERGMs show low variance and all 100 samples seem to be 
placed on the same network- sign of degeneracy. 

uses spherical features as points on a sphere would belong to the relative boundary of 
the corresponding convex set rather than its relative interior. Based on this mapping, 
we outlined a novel model for graphs (ELSRGM) and an approach to sampling graphs 
from this model. In several synthetic and real-world social network, our approach 
generated graphs with statistics similar to that of the example graphs while not 
suffering from the issue of degeneracy (unlike ERGMs). 

ELSRGM opens up a new class of graph sampling models: those based on spherical 
features. In this paper, we made several modeling choices, e.g., von Mises-Fisher 
distribution for spherical density, kernel approach assigning mass to individual feature 
vectors; other modeling choices could also lead to models preserving properties of the 
example graphs, and they need to be investigated. The insight from the geometric 
interpretation of the feature vectors realizable as modes may also lead to other types 
of features (non-spherical) which can lead to models generating realistic graphs. 
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Figure 10: (a) Synthetic graph G test2 ; (b) Corresponding feature-pair: f(G test2 ) G 
rbd (C); (c) Simulations from models given G test2 . The observed statistics are indi- 
cated by the solid lines; the box plots include the median and interquartile range of 
simulated networks. Both models are non-degenerate. 

6 Acknowledgements 

The authors thank Okan Ersoy, S.V.N. Vishwanathan, and Richard C. Wilson for 
helpful discussions and suggestions. This research was supported by the NSF Award 
IIS-0916686. 



References 

[1] O. Barndorff-Nielsen. Information and exponential families in statistical theory. 
Wiley, New York, 1978. 

[2] A. Br0ndsted. An Introduction to Convex Polytopes. Springer- Verlag, 1983. 



Degeneracy and Exponential Locally Spherical Random Graph Model 



20 




edge-wise shared partners minimum geodesic distance degree triad census 



Figure 11: Dolphin 62-node network summaries. The observed statistics are indi- 
cated by the solid lines; the box plots include the median and interquartile range of 
simulated networks. ERGM plots shows signs of degeneracy effect. 




Figure 12: Faux-Mesa-High 205-node network summaries. The observed statistics 
are indicated by the solid lines; box plots summarize the statistics for the simulated 
networks. ERGM plots shows signs of degeneracy effect. 




edge-wise shared partners minimum geodesic distance degree triad census 



Figure 13: Kapferer's 39-node network summaries. The observed statistics are indi- 
cated by the solid lines; box plots summarize the statistics for the simulated networks 
using ELSRGM. 
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Figure 14: left: Faux-Mesa-High 205-node original network, right: ELSRGM generated 
network. 



